Bibliography

[55] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. BERT: Pre-training of deep bidirectional transformers for language understanding. In NAACL-HLT, 2019.

[56] Ruizhou Ding, Ting-Wu Chin, Zeye Liu, and Diana Marculescu. Regularizing activation distribution for training binarized deep networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 11408–11417, 2019.

[57] Ruizhou Ding, Zeye Liu, Rongye Shi, Diana Marculescu, and RD Blanton. LightNN: Filling the gap between conventional deep neural networks and binarized networks. In Proceedings of the Great Lakes Symposium on VLSI 2017, pages 35–40, 2017.

[58] Paul Adrien Maurice Dirac. The physical interpretation of the quantum dynamics. Proceedings of the Royal Society of London. Series A, Containing Papers of a Mathematical and Physical Character, 113(765):621–641, 1927.

[59] Zhen Dong, Zhewei Yao, Daiyaan Arfeen, Amir Gholami, Michael W Mahoney, and Kurt Keutzer. HAWQ-V2: Hessian aware trace-weighted quantization of neural networks. In Neural Information Processing Systems (NeurIPS), pages 18518–18529, 2020.

[60] Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, et al. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929, 2020.

[61] Steven K Esser, Jeffrey L McKinstry, Deepika Bablani, Rathinakumar Appuswamy, and Dharmendra S Modha. Learned step size quantization. arXiv preprint arXiv:1902.08153, 2019.

[62] Mark Everingham, Luc Van Gool, Christopher KI Williams, John Winn, and Andrew Zisserman. The PASCAL visual object classes (VOC) challenge. International Journal of Computer Vision, 88(2):303–338, 2010.

[63] Fartash Faghri, Iman Tabrizian, Ilia Markov, Dan Alistarh, Daniel M Roy, and Ali Ramezani-Kebrya. Adaptive gradient quantization for data-parallel SGD. Advances in Neural Information Processing Systems, 33:3174–3185, 2020.

[64] Angela Fan, Edouard Grave, and Armand Joulin. Reducing transformer depth on demand with structured dropout. arXiv preprint arXiv:1909.11556, 2019.

[65] Angela Fan, Pierre Stock, Benjamin Graham, Edouard Grave, Rémi Gribonval, Hervé Jégou, and Armand Joulin. Training with quantization noise for extreme model compression. arXiv preprint arXiv:2004.07320, 2020.

[66] Pedro Felzenszwalb and Ramin Zabih. Discrete optimization algorithms in computer vision. Tutorial at IEEE International Conference on Computer Vision, 2007.

[67] Yoav Freund, Robert E Schapire, et al. Experiments with a new boosting algorithm. In ICML, volume 96, pages 148–156. Citeseer, 1996.

[68] D. Gabor. Journal of the Institution of Electrical Engineers - Part III: Radio and Communication Engineering 1945-1948, 1946.